DP Application - CrowdSourcing

[2015-Layla] is a survey on the MobileCrowdScoucing privacy.

Four factors in CrowdSourcing:

  1. sensing data quality, which tries to maximize the data quality measured by a certain metric (mostly used in environmental monitoring tasks)
  2. incentive cost, which aims at minimizing the total budget (from the task organizer perspective)
    for an MCS task with different incentive mechanisms, such as pay per participant or pay per task
  3. energy consumption, whose objective is to identify an optimal collaborative data sensing and uploading scheme with energy-saving techniques such as piggybacking
  4. travel distance, where the travel distance of a user for accomplishing a task is considered in task allocation, in order to minimize the overall travel distance for all the tasks.

[2018-Wang Jiangtao] provides the comprehensive survey on MCS(MobileCrowdScoucing) task allocation.

Task Assignments in CrowdSourcing

In croudsourcing, workers with mobile devices to collect data and send it to task requester for rewards.

In task assignment, organizers need participants’ precise locations for optimal task allocation. However, the exposure of their locations raises privacy concerns. Especially for those who are not eventually selected for any task, their location privacy is sacrificed in vain.

So in the differential privacy task assignment croudsourcing, the goal is to design a data release method that accurately represents the distribution of the workers and helps the Server efficiently match workers with tasks without compromising the privacy of their locations.

Solutions

[2014-To]

it is the first one to solve this problem.

  1. Framework

    • Workers send their locations to a trusted cellular service provider(CSP)
    • CSP collects updates and releases a PSD according to privacy budget 
    • When the SC-server receives a task t, it queries the PSD to determine a geocast region (GR), which is a unique feature of this work. Next, the SC-server initiates a geocast communication process to disseminate t to all workers within GR. Upon receiving request t, a worker w decides whether to perform the task or not.
    • Task assignment.
      • Once server get a task request, server needs to query PSD to find a geocast region, balancing between high task assignment success rate and system overload like worker traveling distance and the number of noticed worker. The author calculate maximum travel distance, models acceptance rate as the function of distance.
      • Geocast region construction.
      • Optimization. Including Partial Cell Selection and Communication Cost.
  2. Privacy Model

    • Privacy leakage: (1) workers disclose information to the task requester once they consent to the task; (2) completion of a task discloses the fact that some worker must have been at that location; (3) but this paper focuses on what happens prior to consent, when worker location and identity must be protected from both task requesters and the SC server.

    • the specific objective is to protect both the location and the identity of workers during task assignment.

    • Private Spatial Decom

      • AG uses a two-level grid and variable cell granularity.

      • For the first-level, domain is divided into $m_1\times{m1}$ cells. This heuristic method is data-independent, and thus does not consume any privacy budget. For each level-1 cell, it is divided into $m_2\times{m2}$ subcells.

      • $k_2$ selection. $k_2$ controls the granularity of level-2 domain and small one leads to compactness in level-2 subcells.

      • improving $m_2$

        Screen Shot 2018-05-30 at 12.27.05 PM

[2018-Yang]

it criticizes that [2014-To] is based on the assumption that workers are uniformly distributed within the domain and and the workers in each cell have the same acceptance rate. So they proposes a workers density-based method.

  1. Framework
    • Workers must submit their location to the CSP, travel to the location designated for the task and collect data using their sensor-equipped device.
    • The CSP collects locations from workers and releases data in sanitized form to the Server for task assignment. There is a trust relationship between CSP and workers.
    • The Server queries the CSP for a sanitized dataset once it receives a task, where server chooses a geocast region GR to disseminate the task to the workers in GR. It then assigns the task to suitable workers, through the CSP, according to a task assignment algorithm.
  2. Modules
    • Using quadtree to partition the region based on worker density. But the authors change this method so that the partitioning is based on the worker density instead of choosing the middle point.
    • Partitioning point selection. Before partitioning, m initial points are randomly generated within the cell, ==============================(TODO)
    • Differential privacy data release. A noisy count of the number of workers in each cell is released to protect the privacy of worker locations, where whether or not a worker
      within a specific cell cannot be identified.
    • Task assignment. Firstly, teh geocast region is selected based on task assignment success rate and system overhead (the distance workers need to travel and the number of workers notified of the task).
  3. Privacy Model
    • The basic idea of private data release is that the domain of worker locations is partitioned into small cells and Laplace noise is added to the count of workers in each cell to achieve a differential privacy guarantee.
    • Pervious literature assumes the worker locations are distributed uniformly, and the workers in each cell have the same acceptance rate, which is not the case in real-world scenarios. Partitioning the data domain into a uniform grid would result in sizeable errors. Therefore, we propose a recursive partitioning process based on worker density.
    • The aim is to identify dense regions and sparse regions and make the distribution of the workers in each smaller region as near to uniform as possible.
    • Adopting data-independent quadtree into workers density-based data-independent technique.
      • Partitioning stop condition. Traditional quadtrees require the data publisher to specify the height of the partitioning. In this paper, the process stops if (1) no workers exist in the cell; (2) the area of cell is less than some threshold, The smaller the cell, the more uniform the distribution of workers within it; (3) the distribution of workers in a cell is relatively uniform, which is measures by the threshold of maximum density difference.
      • First, $m$ initial partition points in the location domain need to be selected, where $m=\frac{\sqrt{area \ of \ cell}}{\alpha}$
      • check condition (1) and (2), which is the number of points in cell is greater than 0 and the area of cell is greater than the threshold. If one of answers is no, stop partitioning.
      • for each partitioning point, calculate the density of the subcells divided by this point. choose the point which has the maximum density difference bbetween subcells.
      • If the biggest density difference is greater than the threshold β, the cell is partitioned at point. Otherwise, the cell will not be partitioned as the distribution of worker in the cell is already close to uniform. For example, the density of four subcells is 1, 2, 1, 2, which means in each subcells, there are 1, 2, 1 and 2 points in each subcell, which is kind of uniform distribution in the cell.

[2017-Wang]

  1. Framework

    • Platform-side : Geo-Obfuscation Function Generation
    • User-side : Location Obfuscation
    • Platform-side : Obfuscation-aware Task Allocation
  2. Privacy Model

    Overall, the aurhor models workers’ travel distance to task locations as the function of geo-obfuscation function and task allocation. And by calculating the optimal function, we can get the Geo-Obfuscation matrix which satisfies DP and task allocation schemes.

    Screen Shot 2018-05-30 at 10.06.41 AM

    • the expected travel distance of assigning a task at $l_t$ to a user at (obfuscated) $l*$ given the geo-obfuscation function $P$.

      Screen Shot 2018-05-30 at 10.18.25 AM

    • optimal function

      Screen Shot 2018-05-30 at 10.55.21 AM

      Screen Shot 2018-05-30 at 11.05.55 AM

      Screen Shot 2018-05-30 at 10.56.24 AM

    • task allocation

      Screen Shot 2018-05-30 at 11.07.06 AM

    • Candidate Geo-Distribution Estimation

      Screen Shot 2018-05-30 at 11.10.54 AM

    • Laplace

      Screen Shot 2018-05-30 at 11.27.15 AM

    • Geo-distribution Estimation

      Screen Shot 2018-05-30 at 11.50.49 AM

      Screen Shot 2018-05-30 at 11.50.57 AM

      Screen Shot 2018-05-30 at 11.51.05 AM

[2015-Gong]

[2016-Jin]

This paper considers how to effectively incentivize worker participation. And proposes a system framework that integrates an incentive, a weighted data aggregation, and a data perturbation mechanism which protects the sensing data.

  1. Framework

    Screen Shot 2018-06-01 at 11.00.21 AM

    • Aggregation Mechanism

      To guarantee that the perturbed results have satisfactory accuracy, the original aggregated results before perturbation need to be accurate enough in the first place. Therefore, we reasonably assume that the platform uses a weighted aggregation method to calculate the aggregated result $x_j$ for each task $\tau_j$ based on workers’ data.

      The motivation for utilizing weighted aggregation is to capture the effect of workers’ diverse skill levels on the calculation of the aggregated results.

    • Incentive Mechanism

      aim to design a pSRC auction that minimizes the platform’s total payment with satisfactory data aggregation

      Screen Shot 2018-06-01 at 11.51.45 AM

      where $y_i$ means worker is seleted for a task, $\alpha_j$ is the error threshold for task j, $\theta_{i,j}$ is error of worker i for task j.

  2. Privacy model

    • workers locally sense a specific object or phenomenon and server aggregrate the uploaded data and publish a noise aggregated results to protect worker’s privacy.

    • the noise scale is deviated from the $(\alpha,\beta)-accuracy$.

    • Screen Shot 2018-06-01 at 12.09.47 PM-7873006

    • Screen Shot 2018-06-01 at 12.25.25 PM

    • Screen Shot 2018-06-01 at 12.19.28 PM

      Remarks:

      $Pr(|N_j|\ge\alpha_j)=Pr(N_j\ge\alpha_j)+Pr(-N_j\le-\alpha_j)$

      according to the symmetry of PDF, $Pr(N_j\ge\alpha_j)=Pr(-N_j\le-\alpha_j)$

      so $Pr(|N_j|\ge\alpha_j)=2Pr(N_j\ge\alpha_j)$

Data Sensing in CrowdSourcing

Solutions

[2016-Wang]

In this paper, instead of studying task assignment, the author focuses on sparse croudsensing. Due to large target sensing area and limited budget which result in insufficient spatial coverage of mobile users, sparse mobile croudsensing impute information of the uncovered regions by combining historical records with available sensing data from nearby regions.

  1. Framework
    • The server side generates probabilistic obfuscation matrix and data adjustment function in an offline way.
    • user sider senses its actual location and then maps the associated region to another region according to probabilistic obfuscation matrix. After that, the data adjustment function alters the original sensing data to fit the properties of the obfuscated region. Finally, mobile client then uploads the modified region and data to the server.
    • The server side does data inference : modeled as a matrix completion problem, where each element in the matrix is the value like temperature of a region at time t. In this paper, they use compressive sensing theory for inference.
  2. Privacy Model

    • Privacy leakage : In Sparse MCS, participants report the sensing data with time stamps and geographical coordinates, which may introduce serious privacy risks.

    • Data utility: Due to location obfuscation, the uploaded region data is not actual value of this region, so adjustment is needed to decrease data utility loss. the data quality loss is determined by the difference of sensing data between the actual and the obfuscated locations, instead of the geographic distance. In other words, a participant’s location may be mapped to a place far away, as long as the sensing values of the two locations are close enough.

    • Adversary model - Bayesian attack

      Screen Shot 2018-05-30 at 12.10.00 PM

      Screen Shot 2018-05-30 at 12.10.48 PM

      Screen Shot 2018-05-30 at 12.11.49 PM

    • optimal function

      The authors model the sensing data quality loss as the function of location obfuscation matrix.

      Screen Shot 2018-05-30 at 12.20.20 PM

      Screen Shot 2018-05-30 at 12.22.12 PM

      Screen Shot 2018-05-30 at 12.22.56 PM

Reference

[2014-To] A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing

[2018-Yang] Density-Based Location Preservation for Mobile Crowdsensing With Differential Privacy

[2016-Wang] Differential Location Privacy for Sparse Mobile Crowdsensing

[2015-Layla] Participant privacy in mobile crowd sensing task management: A survey of methods and challenges

[2015-Gong] Protecting Location Privacy for Task Allocation in Ad Hoc Mobile Cloud Computing

[2016-Jin] INCEPTION: Incentivizing Privacy-Preserving Data Aggregation for Mobile Crowd Sensing Systems

[2018-Wang Jiangtao] Task Allocation in Mobile Crowd Sensing: State of the Art and Future Opportunities